Let `BackgroundProcessor` drive HTLC forwarding #3891
Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!
Does this in any way prevent users from opting out of delays or batching, assuming that's what they want?
On the contrary actually: it effectively reduces the (mean and min) forwarding delay quite a bit, which we can allow as we're gonna add larger receiver-side delays in the next step. And, while it gets rid of the event, users are still free to call `process_pending_htlc_forwards` manually.
Force-pushed from ceb3335 to 9ba691c
Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?
LDK always processes HTLCs in batches (note that `process_pending_htlc_forwards` handles all pending HTLCs at once).
Polling may be cheap, but forcing users to poll when there is an event mechanism available, is that really the right choice? Perhaps the event is beneficial for testing, debugging, and monitoring too?
The event never carried any information, so it is not helpful for debugging or 'informational' purposes. Plus, it means at least 1-2 more rounds of …
But at least the event could wake up the background processor, whereas now nothing wakes it up for forwards and the user is forced to call into the channel manager at a high frequency? Not sure if there is a lighter way to wake up the BP without persistence involved. Also, if you have to call into the channel manager anyway, aren't there more events/notifiers that could be dropped?
I may have missed this deciding moment. If the assertions were useless to begin with, no problem dropping them, of course. I can imagine, though, that at some points a peek into the pending HTLC state is still required to not reduce the coverage of the tests?
Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/race-y). This is what is implemented here. If users want to bend the recommended/default approach they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if it's not used anymore.
I don't think this is generally the case, no. The 'assertion' that is mainly dropped is 'we generated an event'; everything else remains the same.
Force-pushed from 9ba691c to b38c19e
This doesn't rule out a notification when there's something to forward, to at least not keep spinning when there's nothing to do?
Force-pushed from c1a0b35 to d35c944
Force-pushed from d35c944 to c21aeab
Finished for now with the test refactoring post-dropping `PendingHTLCsForwardable`.
✅ Added second reviewer: @valentinewallace
```
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
        break;
    }

    if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
        $channel_manager.get_cm().process_pending_htlc_forwards();
```
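For context, a minimal sketch of the polling pattern this hunk implements. This is not the actual macro body; the loop shape, the `sample_batch_delay` helper, and the sleep-based wakeup cadence are assumptions for illustration:

```rust
use std::time::{Duration, Instant};

// Simplified stand-in for the `define_run_body` logic above: drive HTLC
// forwarding whenever the current batch delay has elapsed, then resample
// the delay for the next batch.
fn forwarding_loop_sketch(
    process_pending_htlc_forwards: impl Fn(), sample_batch_delay: impl Fn() -> Duration,
) {
    let mut last_call = Instant::now();
    let mut cur_batch_delay = sample_batch_delay();
    loop {
        // ... event handling, persistence, and other timers happen here ...
        if last_call.elapsed() >= cur_batch_delay {
            process_pending_htlc_forwards();
            last_call = Instant::now();
            cur_batch_delay = sample_batch_delay();
        }
        std::thread::sleep(Duration::from_millis(10)); // placeholder wakeup cadence
    }
}
```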
Looked a bit closer at this function. There is a lot of logic in there, and various locks are obtained.
🔔 1st Reminder Hey @valentinewallace! This PR has been waiting for your review.
🔔 2nd Reminder Hey @valentinewallace! This PR has been waiting for your review.
Force-pushed from c21aeab to e2ad6ca
Now that we have `BackgroundProcessor` drive the batch forwarding of HTLCs, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms.
Now added a commit that adds a simple …
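The PR draws delays from a precomputed table (see the diff further below), but for intuition, here is a hedged sketch of the underlying math: a log-normal variable with mean 50ms can be generated from two uniform samples via the Box–Muller transform. The `SIGMA` shape parameter here is an assumption for illustration, not the value used in the PR:

```rust
// Illustrative only: map two uniform samples in (0, 1) to a log-normal
// delay in milliseconds whose mean is 50ms.
fn log_normal_delay_ms(u1: f64, u2: f64) -> f64 {
    const SIGMA: f64 = 0.5; // assumed shape parameter
    // Pick mu such that the distribution's mean, exp(mu + sigma^2 / 2), is 50.
    let mu = 50.0_f64.ln() - SIGMA * SIGMA / 2.0;
    // Box–Muller: turn two uniforms into one standard normal sample.
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * core::f64::consts::PI * u2).cos();
    (mu + SIGMA * z).exp()
}
```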
Force-pushed from 64cb370 to d7387bf
```
pub fn is_pending_htlc_processing(&self) -> bool {
    let has_forward_htlcs = !self.forward_htlcs.lock().unwrap().is_empty();
    let has_decode_update_add_htlcs = !self.decode_update_add_htlcs.lock().unwrap().is_empty();
    let has_outbound_needing_abandon = self.pending_outbound_payments.needs_abandon();
```
This seems pre-existing, but aren't we also pending processing if we have auto-retryable outbound payments, not only if they need abandoning (i.e., what we then process via `check_retry_payments`)? Not sure if I'm missing something here? (cc @valentinewallace)
I believe on `main` we push a forwardable event when an HTLC fails, thus triggering the `check_retry_payments` call when the event is handled. We might need to add a method to `outbound_payments` to see whether any payments are `is_auto_retryable_now()`.
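A rough sketch of the helper suggested above, with stub types standing in for LDK internals; all names here are assumptions, not the actual API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Stub types standing in for LDK internals, purely for illustration.
struct PendingOutboundPayment {
    retryable: bool,
}

impl PendingOutboundPayment {
    fn is_auto_retryable_now(&self) -> bool {
        self.retryable
    }
}

struct OutboundPayments {
    pending_outbound_payments: Mutex<HashMap<[u8; 32], PendingOutboundPayment>>,
}

impl OutboundPayments {
    // Hypothetical helper: report whether any payment could be retried now,
    // so the pending-processing check can account for pending retries too.
    fn has_auto_retryable_payments(&self) -> bool {
        self.pending_outbound_payments.lock().unwrap().values()
            .any(|payment| payment.is_auto_retryable_now())
    }
}
```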
Force-pushed from 4519c80 to e4142d6
.. as `forward_htlcs` now does the same thing
.. as `fail_htlcs_backwards_internal` now does the same thing
We move the code into the `optionally_notify` closure, but maintain the behavior for now. In the next step, we'll use this to make sure we only repersist when necessary. Best reviewed via `git diff --ignore-all-space`
We skip repersisting `ChannelManager` when nothing is actually processed.
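A hedged sketch of the skip-repersist pattern these commits describe. The enum mirrors the general shape of LDK's `NotifyOption`, but the variant names and the closure signature here are assumptions:

```rust
// Rough shape of the persist decision made inside the `optionally_notify`
// closure: only request a ChannelManager repersist if work was actually done.
enum NotifyOption {
    DoPersist,
    SkipPersistNoEvents,
}

fn process_with_optional_notify(has_work: bool) -> NotifyOption {
    if !has_work {
        // Nothing was processed: skip the ChannelManager repersist entirely.
        return NotifyOption::SkipPersistNoEvents;
    }
    // ... forward/fail the pending HTLCs here ...
    NotifyOption::DoPersist
}
```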
We add a reentrancy guard to disallow entering `process_pending_htlc_forwards` multiple times. This ensures we skip any additional processing calls if a prior round/batch of processing is still underway.
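A minimal sketch of such a reentrancy guard using an atomic flag; LDK's actual implementation may differ:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Only one caller may run the processing body at a time; concurrent
// callers return immediately instead of processing a second batch.
static PROCESSING: AtomicBool = AtomicBool::new(false);

fn process_pending_htlc_forwards_guarded(process: impl FnOnce()) {
    // compare_exchange succeeds only if no other call is in flight.
    if PROCESSING
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        return; // a prior batch is still being processed; skip this round
    }
    process();
    PROCESSING.store(false, Ordering::Release);
}
```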
Force-pushed from e4142d6 to dac90af
Can we squash? I'm also tempted to ask you to break up this PR 😅 Seems the complexity has grown, since it's at +3000 lines now.
```
const FALLBACK_DELAY: u16 = 50;
let delay;

#[cfg(feature = "std")]
{
    const USIZE_LEN: usize = core::mem::size_of::<usize>();
    let mut random_bytes = [0u8; USIZE_LEN];
    possiblyrandom::getpossiblyrandom(&mut random_bytes);

    // Use the random bytes to pick an index into the precomputed delay
    // table, falling back to the constant delay if the lookup fails.
    let index = usize::from_be_bytes(random_bytes) % FWD_DELAYS_MILLIS.len();
    delay = *FWD_DELAYS_MILLIS.get(index).unwrap_or(&FALLBACK_DELAY)
}
#[cfg(not(feature = "std"))]
{
    // In no-std builds, skip sampling and always use the fixed fallback delay.
    delay = FALLBACK_DELAY
}
```
Can you document why we have different behavior in no-std?
```
@@ -6358,13 +6357,37 @@ where
    }
}

/// Returns whether we have pending HTLC forwards that need to be processed via
/// [`Self::process_pending_htlc_forwards`].
pub fn is_pending_htlc_processing(&self) -> bool {
```
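This lets callers such as the background processor cheaply skip no-op rounds. A minimal sketch of such a caller; the trait here is a stand-in I introduce for the relevant `ChannelManager` surface, not LDK's actual API:

```rust
// Minimal trait standing in for the relevant ChannelManager methods.
trait HtlcProcessing {
    fn is_pending_htlc_processing(&self) -> bool;
    fn process_pending_htlc_forwards(&self);
}

// Sketch: consult the new method before driving a processing round,
// avoiding needless lock acquisition and repersistence when idle.
fn maybe_process<Cm: HtlcProcessing>(cm: &Cm) {
    if cm.is_pending_htlc_processing() {
        cm.process_pending_htlc_forwards();
    }
}
```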
s/is_pending_htlc_processing/needs_pending_htlc_processing? Right now to me it's ambiguous whether it means we're currently processing htlcs.
```
-/// Handles a PendingHTLCsForwardable and HTLCHandlingFailed event
-macro_rules! expect_pending_htlcs_forwardable_and_htlc_handling_failed {
+/// Processes any HTLC forwards and handles an expected [`Event::HTLCHandlingFailed`].
+macro_rules! process_htlcs_and_expect_htlc_handling_failed {
```
Now that we have the method that checks if forwarding is needed, can we revert some of the diff in tests and change `expect_pending_htlcs_forwardable` to call `ChannelManager::is_pending_htlc_processing`? Seems like strictly better test coverage and easier review. Matt got me overthinking the test coverage changes...
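Something like the following hypothetical helper could preserve the old assertion; the macro name and node accessor are assumptions following the test-util conventions:

```rust
// Hypothetical test helper: assert processing is actually pending before
// driving it, keeping the coverage the old event-based macro provided.
macro_rules! expect_and_process_pending_htlcs {
    ($node: expr) => {{
        assert!($node.node.is_pending_htlc_processing());
        $node.node.process_pending_htlc_forwards();
    }};
}
```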
lightning/src/ln/channelmanager.rs (Outdated)
```
    $htlc.prev_hop.counterparty_node_id;
let channel_id = prev_channel_id;
let outpoint = prev_funding_outpoint;
let htlc_id = $htlc.prev_hop.htlc_id;
```
FWIW I requested this and also prefer only pulling out some vars
Closes #3768.
Closes #1101.

Previously, we'd require the user to manually call `process_pending_htlc_forwards` as part of `PendingHTLCsForwardable` event handling. Here, we rather move this responsibility to `BackgroundProcessor`, which simplifies the flow and allows us to implement reasonable forwarding delays on our side rather than delegating to users' implementations.

Note this also introduces batching rounds rather than calling `process_pending_htlc_forwards` individually for each `PendingHTLCsForwardable` event, which had been unintuitive anyways, as subsequent `PendingHTLCsForwardable` events could lead to overlapping batch intervals, resulting in the shortest timespan 'winning' every time, as `process_pending_htlc_forwards` would of course handle all pending HTLCs at once.

To this end, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms and drop the `PendingHTLCsForwardable` event.

Draft for now as I'm still cleaning up the code base as part of the final commit dropping `PendingHTLCsForwardable`.